Goto

Collaborating Authors

 gaze estimation


SLYKLatent: A Learning Framework for Gaze Estimation Using Deep Facial Feature Learning

Adebayo, Samuel, Dessing, Joost C., McLoone, Seán

arXiv.org Artificial Intelligence

In this research, we present SLYKLatent, a novel approach for enhancing gaze estimation by addressing appearance instability challenges in datasets due to aleatoric uncertainties, covariant shifts, and test domain generalization. SLYKLatent utilizes Self-Supervised Learning for initial training with facial expression datasets, followed by refinement with a patch-based tri-branch network and an inverse explained variance-weighted training loss function. Our evaluation on benchmark datasets achieves a 10.9% improvement on Gaze360, supersedes top MPIIFaceGaze results with 3.8%, and leads on a subset of ETH-XGaze by 11.6%, surpassing existing methods by significant margins. Adaptability tests on RAF-DB and Affectnet show 86.4% and 60.9% accuracies, respectively. Ablation studies confirm the effectiveness of SLYKLatent's novel components.


WEBEYETRACK: Scalable Eye-Tracking for the Browser via On-Device Few-Shot Personalization

Davalos, Eduardo, Zhang, Yike, Srivastava, Namrata, Thatigotla, Yashvitha, Salas, Jorge A., McFadden, Sara, Cho, Sun-Joo, Goodwin, Amanda, TS, Ashwin, Biswas, Gautam

arXiv.org Artificial Intelligence

With advancements in AI, new gaze estimation methods are exceeding state-of-the-art (SOTA) benchmarks, but their real-world application reveals a gap with commercial eye-tracking solutions. Factors like model size, inference time, and privacy often go unaddressed. Meanwhile, webcam-based eye-tracking methods lack sufficient accuracy, in particular due to head movement. To tackle these issues, we introduce We bEyeTrack, a framework that integrates lightweight SOTA gaze estimation models directly in the browser. It incorporates model-based head pose estimation and on-device few-shot learning with as few as nine calibration samples (k < 9). WebEyeTrack adapts to new users, achieving SOTA performance with an error margin of 2.32 cm on GazeCapture and real-time inference speeds of 2.4 milliseconds on an iPhone 14. Our open-source code is available at https://github.com/RedForestAi/WebEyeTrack.


DMAGaze: Gaze Estimation Based on Feature Disentanglement and Multi-Scale Attention

Chen, Haohan, Liu, Hongjia, Lan, Shiyong, Wang, Wenwu, Qiao, Yixin, Li, Yao, Deng, Guonan

arXiv.org Artificial Intelligence

Gaze estimation, which predicts gaze direction, commonly faces the challenge of interference from complex gaze-irrelevant information in face images. In this work, we propose DMAGaze, a novel gaze estimation framework that exploits information from facial images in three aspects: gaze-relevant global features (disentangled from facial image), local eye features (extracted from cropped eye patch), and head pose estimation features, to improve overall performance. Furthermore, we introduce a new cascaded attention module named Multi-Scale Global Local Attention Module (MS-GLAM). Through a customized cascaded attention structure, it e ffectively focuses on global and local information at multiple scales, further enhancing the information from the Disentangler. Finally, the global gaze-relevant features disentangled by the upper face branch, combined with head pose and local eye features, are passed through the detection head for high-precision gaze estimation. Our proposed DMAGaze has been extensively validated on two mainstream public datasets, achieving state-of-the-art performance. Keywords: gaze estimation, feature disentanglement, Gaussian similarity, multi-scale attention1. Introduction Gaze estimation, the task of predicting gaze direction, crucial for measuring human attention, is widely applied in areas like saliency detection[1, 2], virtual reality[3], driver distraction monitoring[4], human-computer interaction[5] and autism diagnosis[6]. Recently, gaze estimation has shifted from model-based methods to appearance-based methods.


Gaze estimation learning architecture as support to affective, social and cognitive studies in natural human-robot interaction

Lombardi, Maria, Maiettini, Elisa, Wykowska, Agnieszka, Natale, Lorenzo

arXiv.org Artificial Intelligence

Gaze is a crucial social cue in any interacting scenario and drives many mechanisms of social cognition (joint and shared attention, predicting human intention, coordination tasks). Gaze direction is an indication of social and emotional functions affecting the way the emotions are perceived. Evidence shows that embodied humanoid robots endowing social abilities can be seen as sophisticated stimuli to unravel many mechanisms of human social cognition while increasing engagement and ecological validity. In this context, building a robotic perception system to automatically estimate the human gaze only relying on robot's sensors is still demanding. Main goal of the paper is to propose a learning robotic architecture estimating the human gaze direction in table-top scenarios without any external hardware. Table-top tasks are largely used in many studies in experimental psychology because they are suitable to implement numerous scenarios allowing agents to collaborate while maintaining a face-to-face interaction. Such an architecture can provide a valuable support in studies where external hardware might represent an obstacle to spontaneous human behaviour, especially in environments less controlled than the laboratory (e.g., in clinical settings). A novel dataset was also collected with the humanoid robot iCub, including images annotated from 24 participants in different gaze conditions.


Cross-Dataset Gaze Estimation by Evidential Inter-intra Fusion

Wang, Shijing, Huang, Yaping, Xie, Jun, YiTian, null, Chen, Feng, Wang, Zhepeng

arXiv.org Artificial Intelligence

Achieving accurate and reliable gaze predictions in complex and diverse environments remains challenging. Fortunately, it is straightforward to access diverse gaze datasets in real-world applications. We discover that training these datasets jointly can significantly improve the generalization of gaze estimation, which is overlooked in previous works. However, due to the inherent distribution shift across different datasets, simply mixing multiple dataset decreases the performance in the original domain despite gaining better generalization abilities. To address the problem of ``cross-dataset gaze estimation'', we propose a novel Evidential Inter-intra Fusion EIF framework, for training a cross-dataset model that performs well across all source and unseen domains. Specifically, we build independent single-dataset branches for various datasets where the data space is partitioned into overlapping subspaces within each dataset for local regression, and further create a cross-dataset branch to integrate the generalizable features from single-dataset branches. Furthermore, evidential regressors based on the Normal and Inverse-Gamma (NIG) distribution are designed to additionally provide uncertainty estimation apart from predicting gaze. Building upon this foundation, our proposed framework achieves both intra-evidential fusion among multiple local regressors within each dataset and inter-evidential fusion among multiple branches by Mixture \textbfof Normal Inverse-Gamma (MoNIG distribution. Experiments demonstrate that our method consistently achieves notable improvements in both source domains and unseen domains.


Gaze Estimation on Spresense

Ruegg, Thomas, Bonazzi, Pietro, Ronco, Andrea

arXiv.org Artificial Intelligence

Gaze estimation is a valuable technology with numerous applications in fields such as human-computer interaction, virtual reality, and medicine. This report presents the implementation of a gaze estimation system using the Sony Spresense microcontroller board and explores its performance in latency, MAC/cycle, and power consumption. The report also provides insights into the system's architecture, including the gaze estimation model used. Additionally, a demonstration of the system is presented, showcasing its functionality and performance. Our lightweight model TinyTrackerS is a mere 169Kb in size, using 85.8k parameters and runs on the Spresense platform at 3 FPS.


Semi-Synthetic Dataset Augmentation for Application-Specific Gaze Estimation

Leblond-Menard, Cedric, Picard-Krashevski, Gabriel, Achiche, Sofiane

arXiv.org Artificial Intelligence

Although the number of gaze estimation datasets is growing, the application of appearance-based gaze estimation methods is mostly limited to estimating the point of gaze on a screen. This is in part because most datasets are generated in a similar fashion, where the gaze target is on a screen close to camera's origin. In other applications such as assistive robotics or marketing research, the 3D point of gaze might not be close to the camera's origin, meaning models trained on current datasets do not generalize well to these tasks. We therefore suggest generating a textured tridimensional mesh of the face and rendering the training images from a virtual camera at a specific position and orientation related to the application as a mean of augmenting the existing datasets. In our tests, this lead to an average 47% decrease in gaze estimation angular error.


Contrastive Representation Learning for Gaze Estimation

Jindal, Swati, Manduchi, Roberto

arXiv.org Artificial Intelligence

Self-supervised learning (SSL) has become prevalent for learning representations in computer vision. Notably, SSL exploits contrastive learning to encourage visual representations to be invariant under various image transformations. The task of gaze estimation, on the other hand, demands not just invariance to various appearances but also equivariance to the geometric transformations. In this work, we propose a simple contrastive representation learning framework for gaze estimation, named Gaze Contrastive Learning (GazeCLR). GazeCLR exploits multi-view data to promote equivariance and relies on selected data augmentation techniques that do not alter gaze directions for invariance learning. Our experiments demonstrate the effectiveness of GazeCLR for several settings of the gaze estimation task. Particularly, our results show that GazeCLR improves the performance of cross-domain gaze estimation and yields as high as 17.2% relative improvement. Moreover, the GazeCLR framework is competitive with state-of-the-art representation learning methods for few-shot evaluation. The code and pre-trained models are available at https://github.com/jswati31/gazeclr.


Eye Gaze Estimation Model Analysis

Kottwani, Aveena, Kumar, Ayush

arXiv.org Artificial Intelligence

We explore techniques for eye gaze estimation using machine learning. Eye gaze estimation is a common problem for various behavior analysis and human-computer interfaces. The purpose of this work is to discuss various model types for eye gaze estimation and present the results from predicting gaze direction using eye landmarks in unconstrained settings. In unconstrained real-world settings, feature-based and model-based methods are outperformed by recent appearance-based methods due to factors like illumination changes and other visual artifacts. We discuss a learning-based method for eye region landmark localization trained exclusively on synthetic data. We discuss how to use detected landmarks as input to iterative model-fitting and lightweight learning-based gaze estimation methods and how to use the model for person-independent and personalized gaze estimations.